xen.git
Keir Fraser [Fri, 9 Oct 2009 07:55:43 +0000 (08:55 +0100)]
blktap2: Check gcrypt library has MD5() function.

From: Dulloor <dulloor@gmail.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 9 Oct 2009 07:54:25 +0000 (08:54 +0100)]
Fix the IA64 build of the hypervisor.

This is completely untested, beyond confirming that it compiles.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
Keir Fraser [Fri, 9 Oct 2009 07:53:42 +0000 (08:53 +0100)]
xend: Fix bug in superpage flag handling

During testing I discovered that using a bootloader magically clears
the superpage flag out of the config.  This small patch fixes that
behavior.

From: Dave McCracken <dcm@mccr.org>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 8 Oct 2009 08:24:32 +0000 (09:24 +0100)]
Obtain Linux kernels via git-over-http

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 8 Oct 2009 07:51:51 +0000 (08:51 +0100)]
x86 tsc: Fix check_tsc_warp() bug and add copyright notice

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 8 Oct 2009 07:48:52 +0000 (08:48 +0100)]
xsm: Correct the usage of XSM_ENABLE after c/s 20285.

Signed-off-by: Machon Gregory <mbgrego@tycho.ncsc.mil>

Keir Fraser [Thu, 8 Oct 2009 07:47:11 +0000 (08:47 +0100)]
Update QEMU_TAG to a05958b6e32f1748ea70b1efca13394956c0698b

Keir Fraser [Wed, 7 Oct 2009 15:29:03 +0000 (16:29 +0100)]
x86 shadow: Fix the build.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 15:10:19 +0000 (16:10 +0100)]
libxenctrl: Fix non-Linux definitions of xc_gnttab_map_table*().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 14:58:26 +0000 (15:58 +0100)]
Fix hypervisor crash with unpopulated NUMA nodes

On NUMA systems with memory-less nodes Xen crashes quite early in the
hypervisor (while initializing the heaps). This is not an issue if
this happens to be the last node, but "inner" nodes trigger this
reliably.  On multi-node processors it is much more likely to leave a
node unequipped.  The attached patch fixes this by enumerating the
nodes via node_online_map instead of counting from 0 to num_nodes.
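The difference between the two enumeration styles can be sketched in portable C. This is a minimal illustration under stated assumptions: a plain 64-bit mask stands in for node_online_map, and the helper names (visit_by_count, visit_online, nodemask_sketch_t) are hypothetical, not the hypervisor's nodemask API.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES_SKETCH 64

/* Hypothetical stand-in for node_online_map: bit n set => node n is online. */
typedef uint64_t nodemask_sketch_t;

/* Buggy pattern: assumes the online nodes are exactly 0..num_nodes-1. */
static int visit_by_count(int num_nodes, int out[])
{
    int n = 0;
    assert(num_nodes <= MAX_NODES_SKETCH);
    for (int node = 0; node < num_nodes; node++)
        out[n++] = node;
    return n;
}

/* Fixed pattern: visit exactly the nodes whose bit is set in the online map. */
static int visit_online(nodemask_sketch_t online, int out[])
{
    int n = 0;
    for (int node = 0; node < MAX_NODES_SKETCH; node++)
        if (online & ((nodemask_sketch_t)1 << node))
            out[n++] = node;
    return n;
}
```

Suppose nodes 0, 2 and 3 are online (mask 0xD) and num_nodes == 3. Counting visits 0, 1, 2, so it touches the absent node 1 and misses node 3. The mask-driven loop visits 0, 2, 3.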

The resulting NUMA setup is still somewhat strange, but at least it
does not crash. lowlevel/xc/xc.c has the same enumeration bug, but I
suppose we cannot access the hypervisor's node_online_map from that
context, so the xm info output is not correct (but xm debug-keys H
is).  I plan to rework the handling of memory-less nodes later.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Wed, 7 Oct 2009 14:56:05 +0000 (15:56 +0100)]
x86 shadow: fix the check for having killed the guest in the fault handler.

We care only about when we have called domain_crash() (and therefore
shadow invariants may not hold) and shouldn't spuriously inject
pagefaults into guests that are shutting down for other reasons.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 09:26:39 +0000 (10:26 +0100)]
libxenctrl: Build fix after gnttab_v2 changes.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 07:42:50 +0000 (08:42 +0100)]
PVUSB: Update public header.

Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
Keir Fraser [Wed, 7 Oct 2009 07:07:06 +0000 (08:07 +0100)]
x86 hvm: On failed hvm_send_assist_req(), io emulation state should be
reset to HVMIO_none, as no IO is in flight.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 06:50:20 +0000 (07:50 +0100)]
Scattered code arrangement cleanups.

- remove redundant declarations
- add/move prototypes to headers
- move things to where they belong

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Wed, 7 Oct 2009 06:47:50 +0000 (07:47 +0100)]
Tools-side support for creating and destroying netchannel2 interfaces.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 06:47:21 +0000 (07:47 +0100)]
Transitive grant support.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 06:46:59 +0000 (07:46 +0100)]
Implement sub-page grant support.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 06:46:36 +0000 (07:46 +0100)]
Introduce a grant_entry_v2 structure.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 06:46:14 +0000 (07:46 +0100)]
Rename the struct grant_entry to struct grant_entry_v1, so that it
isn't in the way when we introduce struct grant_entry_v2.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 06:45:39 +0000 (07:45 +0100)]
Optimize memcpy for x86 arch. If the source buffer does not start at a
64-bit boundary, copy a few bytes at the beginning up to the next
64-bit boundary, then do an aligned copy of the remaining data. This
can reduce the copy cost by up to 50%.
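The head-alignment idea can be sketched in portable C. This is not the patched Xen routine. It assumes byte copies for the unaligned head and tail and 64-bit word copies in between, and it judges alignment on the source pointer only, as the message above describes. The function name copy_src_aligned is hypothetical. The word step goes through fixed-size memcpy calls, which compilers typically lower to a single load/store, so the sketch avoids strict-aliasing and unaligned-destination problems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: copy n bytes, first bringing the *source* to an 8-byte boundary. */
static void *copy_src_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert((dst != NULL && src != NULL) || n == 0);

    /* 1. Byte-copy the head until s reaches a 64-bit boundary (at most 7 bytes). */
    while (n > 0 && ((uintptr_t)s & 7) != 0) {
        *d++ = *s++;
        n--;
    }

    /* 2. Copy whole 64-bit words from the now-aligned source. */
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* 3. Byte-copy the remaining tail (fewer than 8 bytes). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```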

Signed-off-by: Jose Renato Santos <jsantos@hpl.hp.com>
Keir Fraser [Wed, 7 Oct 2009 06:45:14 +0000 (07:45 +0100)]
Slightly more accurate dependency tracking for the .c and .h files in
include/compat.  They should depend on the scripts which generate
them, as well as the inputs to those scripts.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 06:44:50 +0000 (07:44 +0100)]
Simplify include/xen/grant_table.h a bit:

-- INITIAL_GRANT_ENTRIES is never used, so can be removed.
-- Simplify num_act_frames_from_sha_frames a little.

Signed-off-by: Steven Smith <steven.smith@citrix.com>
Keir Fraser [Wed, 7 Oct 2009 06:43:50 +0000 (07:43 +0100)]
x86 vtsc: use debug-key to check/test reliable tsc

Previous attempt was rejected as too intrusive, but
further app rdtsc optimization work is very dependent
on Xen being able to determine if TSC is reliable
or not.

This patch starts to introduce the concept of
X86_FEATURE_TSC_RELIABLE as it is defined and
used by Linux, but uses it and tests it only in
a debug-key for now, so that a wide variety of
hardware can be measured by the broader Xen
community to confirm/deny TSC assumptions.
The eventual goal is for the evaluation of
TSC reliability to be exported to userland
so that apps can use rdtsc natively if and when
it is safe to do so.

(See http://lists.xensource.com/archives/html/xen-devel/2009-10/msg00056.html)

Note that the original Linux code for tsc_sync.c
uses a raw spinlock to ensure the "fastest, inlined,
non-debug version of a critical section".  Xen
doesn't provide a _raw_spin_lock() so I used
regular spinlocks, but I would prefer the code
to use something more strict as Linux does.

(Also includes a minor nit: "NOSTOP" was used in
an early version of a Linux patch, but mainline
now uses "NONSTOP"... correct this for consistency.)

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 7 Oct 2009 06:35:06 +0000 (07:35 +0100)]
hvm, vtsc: missing vtsc counter for hvm guests

This counter line got dropped somewhere along the way.
Confused me a bit because the count was always being
reported as 0.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 7 Oct 2009 06:21:31 +0000 (07:21 +0100)]
x86 hvm: Do not incorrectly retire an instruction emulation when a
read/write cycle to qemu is dropped due to guest suspend.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 6 Oct 2009 09:11:53 +0000 (10:11 +0100)]
xend: Updated setpolicy to support loading of flask policy

Updated xm setpolicy command to support the loading of flask security
policies.

Signed-off-by: Machon Gregory <mbgrego@tycho.ncsc.mil>
Signed-off-by: George S. Coker, II <gscoker@alpha.ncsc.mil>

Keir Fraser [Tue, 6 Oct 2009 09:11:14 +0000 (10:11 +0100)]
[VTD] don't enable device ATS if root port does not support it

Fix a bug where the code enabled the ATS capability on the device even
when the root port does not support it.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Keir Fraser [Tue, 6 Oct 2009 09:09:21 +0000 (10:09 +0100)]
x86: Emulated TSC should run at same (1GHz) rate in guest kernel and apps.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 2 Oct 2009 08:10:27 +0000 (09:10 +0100)]
credit scheduler: fix credits overflow

In changing credits-per-tick from 100 to 1000000, a possible overflow
was introduced in the accounting algorithm: when credit totals (which
can be in the millions) get multiplied by a weight (typically 256),
the result can easily overflow a signed 32-bit variable.

Fix this by reverting to 100 credits per tick, and maintain long-term
fairness/correctness by tracking at the nanosecond level exactly how
much execution time has been accounted to each VCPU. We do this by
rounding the execution time so far to the nearest number of credits,
while remembering the VCPU's 'partial credit balance'.
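Both points can be sketched in C. The first function shows why a multi-million credit total times a weight of 256 no longer fits in a signed 32-bit int. The second shows the partial-credit idea: charge whole credits rounded to the nearest credit, and carry the rounding error in nanoseconds into the next accounting pass. The 10 ms tick, the struct and all names are assumptions for illustration, not the credit scheduler's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if credit * weight cannot be represented in a signed 32-bit int. */
static bool product_overflows_s32(int32_t credit, int32_t weight)
{
    int64_t p = (int64_t)credit * weight;
    return p > INT32_MAX || p < INT32_MIN;
}

/* Assumed figures for illustration: 100 credits per 10 ms tick. */
#define CREDITS_PER_TICK 100
#define TICK_NS          10000000LL
#define NS_PER_CREDIT    (TICK_NS / CREDITS_PER_TICK)   /* 100 us */

/* Hypothetical per-VCPU accounting state. */
struct vcpu_acct {
    int32_t credit;      /* whole credits remaining */
    int64_t partial_ns;  /* run time not yet reflected in credit (may be < 0) */
};

/* Charge ran_ns of execution, rounded to the nearest whole credit; the
 * rounding error is remembered so no time is lost or double-counted. */
static void burn_credits(struct vcpu_acct *v, int64_t ran_ns)
{
    assert(ran_ns >= 0);
    int64_t total = v->partial_ns + ran_ns;
    int64_t credits = (total + NS_PER_CREDIT / 2) / NS_PER_CREDIT;
    v->credit -= (int32_t)credits;
    v->partial_ns = total - credits * NS_PER_CREDIT;
}
```

In this sketch, three runs of 150 us (4.5 credits in total) are charged 5 whole credits, and a partial balance of -50 us is carried forward. Long-term accounting therefore stays exact to the nanosecond.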

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 1 Oct 2009 11:29:33 +0000 (12:29 +0100)]
Fix recursive lock p2m lock acquisition in POD code

The PoD code can take the p2m lock from inside a lookup.  This causes
a crash if anyone calls gfn_to_mfn* with the p2m lock held, which
happens in quite a few places.  Make the PoD code understand that it
may be called with the lock held, and do the right thing about taking
or releasing it.
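A minimal sketch of the "may be called with the lock held" pattern, using POSIX threads. It assumes a single hypothetical p2m lock. A thread-local flag records whether the current thread already holds that lock, standing in for whatever owner tracking the hypervisor really uses. The PoD path takes the lock only if needed and releases only what it took. All names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical single p2m lock plus a per-thread record of ownership. */
static pthread_mutex_t p2m_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool p2m_held_by_me;

static void p2m_lock(void)
{
    pthread_mutex_lock(&p2m_lock_sketch);
    p2m_held_by_me = true;
}

static void p2m_unlock(void)
{
    p2m_held_by_me = false;
    pthread_mutex_unlock(&p2m_lock_sketch);
}

/* PoD work reached from a lookup: the caller may or may not hold the lock. */
static int pod_populate_sketch(void)
{
    bool took_lock = false;

    if (!p2m_held_by_me) {   /* non-recursive mutex: never lock it twice */
        p2m_lock();
        took_lock = true;
    }
    assert(p2m_held_by_me);
    int populated = 1;       /* placeholder for the real population work */
    if (took_lock)           /* release only what this function acquired */
        p2m_unlock();
    return populated;
}
```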

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Thu, 1 Oct 2009 11:28:54 +0000 (12:28 +0100)]
stubdom/minios: re-structure headers

As part of making stubdom usable on NetBSD, it is necessary to
restructure the minios headers to avoid conflicts with NetBSD's
crossbuild toolchain.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Keir Fraser [Thu, 1 Oct 2009 11:27:01 +0000 (12:27 +0100)]
VNIF: Using smart polling instead of event notification.

Patch the Xen version of ring.h

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Keir Fraser [Thu, 1 Oct 2009 11:26:15 +0000 (12:26 +0100)]
Fix memory leak in libxenstore python bindings

The temporary tuple0 Python object was not freed at the end of
xspy_set_permissions() when no error occurred. To reduce code
duplication, the success path now reuses the error-path cleanup code.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Keir Fraser [Thu, 1 Oct 2009 11:25:36 +0000 (12:25 +0100)]
Disable HPET broadcast mode on kexec.

Without this, the new kernel cannot receive timer interrupts from the
legacy sources. Hangs are observed in the second kernel's
"check_timer()" routine or at "Checking 'hlt' instruction."

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Wed, 30 Sep 2009 07:51:21 +0000 (08:51 +0100)]
xend: allow domain creation with multiple empty CD-ROM devices

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 30 Sep 2009 07:44:57 +0000 (08:44 +0100)]
xend: Fix memory leaks in libxc python bindings

Reference counts of Python objects are not correctly decremented in
several places in the Python bindings for libxc. Most of them are
around PyList_Append(), which, unlike PyList_SetItem(), increments the
reference count of the object being added to the list.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Keir Fraser [Wed, 30 Sep 2009 07:43:34 +0000 (08:43 +0100)]
Cleanup: Make local functions static and remove unused functions.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Tue, 29 Sep 2009 10:28:33 +0000 (11:28 +0100)]
svm: a few cleanups

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Tue, 29 Sep 2009 10:27:53 +0000 (11:27 +0100)]
vmx: add the support of XSAVE/XRSTOR to VMX guest

XSAVE/XRSTOR manages the existing and future processor extended states
on x86 architecture.

The XSAVE/XRSTOR infrastructure is defined in Intel SDMs:
http://www.intel.com/products/processor/manuals/

The patch uses the classical CR0.TS based algorithm to manage the
states on context switch.  At present, we know 3 bits in the
XFEATURE_ENABLED_MASK: FPU, SSE and YMM.  YMM is defined in Intel AVX
Programming Reference: http://software.intel.com/sites/avx/

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Tue, 29 Sep 2009 10:22:17 +0000 (11:22 +0100)]
mce: Support machine check logging left over from previous reset.

Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
Keir Fraser [Mon, 28 Sep 2009 12:59:35 +0000 (13:59 +0100)]
xend: Fix save/restore after previous changeset.

Platform variable 'tsc_native' is saved/restored as a string, so must
be converted to an integer before passing to domain_set_tsc_native().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 28 Sep 2009 09:01:10 +0000 (10:01 +0100)]
x86: Allow TSC mode (emulate vs native) to be configured per domain.

The default is to emulate. Old saved images will be restored with
legacy behaviour however (native TSC, no emulation).

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 28 Sep 2009 07:28:26 +0000 (08:28 +0100)]
AMD IOMMU: Fix boot output on non-iommu system

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Fri, 25 Sep 2009 14:20:58 +0000 (15:20 +0100)]
x86 hvm: *really* fix missing ticks bug of c/s 20218

With c/s 20218, timer ticks might be missed when IRQs of a timer are
queued. "Next scheduled time" is accumulated wrongly.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 25 Sep 2009 14:12:45 +0000 (15:12 +0100)]
ia64: Fix build for xen/ia64

Define the related dummy functions and make the macros
public to fix the build.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Fri, 25 Sep 2009 09:50:18 +0000 (10:50 +0100)]
x86: Make assigned devices' interrupts delivery to right vcpu.

This patch aims to reduce IPIs when delivering interrupts of VT-d
assigned devices to their target vcpus.  In experiments with a 10G
Oplin NIC, CPU utilization dropped by 5%-6% while the NIC's bandwidth
remained unchanged.  The patch always benefits UP guests with
MSI-capable devices assigned, and SMP guests whose LAPIC destination
mode is physical.  It also benefits SMP guests whose LAPIC dest_mode
is logical but with only one destination specified.  So it should
cover the major cases in real environments.  Currently, the patch
intercepts the guest's programming of the MSI interrupt state and
calculates the destination ID for the pirq in advance, at programming
time.  When a vcpu migrates or the guest reprograms its MSI state, it
checks whether the affinity of the assigned device's corresponding
pirq needs to be updated, keeping the vcpu's affinity and the pirq's
affinity consistent and so reducing IPIs.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Xiaohui Xin <xiaohui.xin@intel.com>
Keir Fraser [Fri, 25 Sep 2009 09:47:36 +0000 (10:47 +0100)]
PoD: Allocate 4k pages if 2 meg allocation fails

In p2m_pod_set_cache_target:
 * If a 2-meg allocation fails, try a 4k allocation
 * If both allocations fail, return -ENOMEM so that the domain build
   will fail.
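The fallback order in p2m_pod_set_cache_target can be sketched with a hypothetical allocator callback standing in for the hypervisor's page allocator. On x86, order 9 means 512 pages of 4 KiB, i.e. 2 MiB. The function names and the toy allocator are illustrative assumptions only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define ORDER_2M 9   /* 2^9 x 4 KiB pages = 2 MiB */
#define ORDER_4K 0

/* Hypothetical allocator interface: returns NULL if 2^order pages cannot be allocated. */
typedef void *(*alloc_fn)(void *ctx, unsigned int order);

/* Grow the PoD cache by one chunk: try 2 MiB, fall back to 4 KiB, else -ENOMEM. */
static int pod_cache_grow_sketch(alloc_fn alloc, void *ctx,
                                 void **page_out, unsigned int *order_out)
{
    static const unsigned int orders[] = { ORDER_2M, ORDER_4K };

    assert(alloc && page_out && order_out);
    for (size_t i = 0; i < sizeof orders / sizeof orders[0]; i++) {
        void *pg = alloc(ctx, orders[i]);
        if (pg != NULL) {
            *page_out = pg;          /* caller would add this to the PoD cache */
            *order_out = orders[i];
            return 0;
        }
    }
    return -ENOMEM;                  /* both failed: let the domain build fail */
}

/* Toy allocator for demonstration only: succeeds for the orders enabled in ctx. */
struct toy_pool { bool have_2m, have_4k; };

static void *toy_alloc(void *ctx, unsigned int order)
{
    static char token;
    const struct toy_pool *p = ctx;
    bool ok = (order == ORDER_2M) ? p->have_2m : p->have_4k;
    return ok ? &token : NULL;
}
```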

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Fri, 25 Sep 2009 09:46:13 +0000 (10:46 +0100)]
xend: Unlink VDI instances and VBD instances

VBD information in xend does not have a VDI value if XenAPI mode
is invalid. This new patch checks whether the VDI value is valid, and
cuts off the VDI->VBD links only if it is.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Fri, 25 Sep 2009 09:45:02 +0000 (10:45 +0100)]
x86, mce: Control Machine Check Log output verbosity

This small patch controls machine check related Xen log output.
When mce_verbosity=verbose is set on the command line, all MCE
related logs are printed. Otherwise, those logs are suppressed.

Signed-off-by: Liping Ke <liping.ke@intel.com>
Keir Fraser [Wed, 23 Sep 2009 17:19:30 +0000 (18:19 +0100)]
x86: minor cleanup of code that writes to TSC

While working on TSC-handling code, I missed an important
piece of code that writes to TSC because it does so
differently from other pieces of code.  Fix that,
and also clean up a bit to avoid hardcoded constants
and use wrmsrl instead of wrmsr plus handwritten 64-bit
dismembering code.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 23 Sep 2009 17:18:29 +0000 (18:18 +0100)]
Update QEMU_TAG to 743edef44f1d0da792aeb38a33bf468a4596f730

Keir Fraser [Tue, 22 Sep 2009 13:19:38 +0000 (14:19 +0100)]
EPT: Assert p2m is locked in ept_sync_domain().
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 22 Sep 2009 13:18:51 +0000 (14:18 +0100)]
x86: Support more than 256 pins of ioapic.

Some large systems may have many IOAPICs which
have more than 256 pins in total. To support this
case, simply let pirq == irq and build a 1:1 mapping
between them; this is based on the assumption
that pirq == GSI number in dom0 for IOAPIC IRQs.

Thanks to Jan Beulich from Novell for reporting the issue
in pv_ops dom0.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Tue, 22 Sep 2009 13:11:09 +0000 (14:11 +0100)]
x86: Fix the build.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 22 Sep 2009 08:18:25 +0000 (09:18 +0100)]
EPT: More efficient ept_sync_domain().

Rather than always flushing all CPUs, only flush CPUs this domain is
currently active on, and defer flushing other CPUs until this domain
is scheduled onto them (or the domain is destroyed).
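The deferred-flush bookkeeping can be sketched with plain bitmasks. The sketch assumes a 64-bit mask of physical CPUs and uses hypothetical names. The real code works with hypervisor cpumasks and performs the actual EPT TLB invalidation on the CPUs it flushes, which is only indicated by comments here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask_sketch_t;   /* bit n = physical CPU n */

/* Hypothetical per-domain state. */
struct domain_sketch {
    cpumask_sketch_t active;       /* CPUs currently running this domain */
    cpumask_sketch_t needs_flush;  /* CPUs whose cached EPT translations may be stale */
};

/* After changing EPT entries: flush active CPUs now, defer the rest.
 * Returns the set of CPUs the caller should flush immediately. */
static cpumask_sketch_t ept_sync_sketch(struct domain_sketch *d, cpumask_sketch_t all_cpus)
{
    cpumask_sketch_t now = d->active & all_cpus;
    d->needs_flush |= all_cpus & ~now;   /* flushed lazily on next schedule-in */
    d->needs_flush &= ~now;              /* these are being flushed right away */
    return now;
}

/* When the domain is scheduled onto cpu: returns 1 if a deferred flush is due. */
static int ept_schedule_in_sketch(struct domain_sketch *d, unsigned int cpu)
{
    assert(cpu < 64);
    cpumask_sketch_t bit = (cpumask_sketch_t)1 << cpu;
    int flush = (d->needs_flush & bit) != 0;
    d->needs_flush &= ~bit;
    d->active |= bit;
    return flush;
}
```

Destroying the domain would similarly need to account for any CPUs still marked in needs_flush. That path is not modelled here.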

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 22 Sep 2009 07:37:32 +0000 (08:37 +0100)]
mca: Fix several issues for MCA UCR error handling

This patch fixes several issues in MCA UCR error handling on the
latest Intel platforms, including:
1) For UCR error, the  is 0xC0 ~ 0xCF instead of just C0
2) Synchronization issues between clearing the error finding flag and
clearing the global MCIP flag, without which the MCIP flag can't be
cleared in some cases.

Signed-off-by: Liping Ke <liping.ke@intel.com>
Keir Fraser [Tue, 22 Sep 2009 07:36:40 +0000 (08:36 +0100)]
tboot: fix tboot memory mapping for 32b

This patch uses fixmap to get the TXT heap base/size and SINIT
base/size from the TXT pub config registers (whose address starts at
0xfed20000), and to get the DMAR table copy from the TXT heap (whose
address may start at 0x7d520000) for tboot, instead of using
map_pages_to_xen(), which causes a panic on x86_32.

Signed-off-by: Shane Wang <shane.wang@intel.com>
Keir Fraser [Tue, 22 Sep 2009 07:28:26 +0000 (08:28 +0100)]
x86: allow IRQs to work without APIC again

Non-IO-APIC IRQs must get 1:1 mapped between domain PIRQ and Xen IRQ.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:27:10 +0000 (08:27 +0100)]
Improve CSE in grant table code

The grant table code had some particularly frequent repeated calls to
mfn_to_page() with the same input arguments each time. To help the
compiler (which can do only a limited job on CSE), this adds explicit
caching of the transformation result in a few places.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:26:16 +0000 (08:26 +0100)]
Introduce new flavour of map_domain_page()

Introduce a variant of map_domain_page() that is passed a struct
page_info * argument directly, based on the observation that in many
places the argument to this function so far was simply the result of
page_to_mfn(). This is meaningful for the x86-64 case, where
map_domain_page() is really just an invocation of mfn_to_virt(), and
hence the combined mfn_to_virt(page_to_mfn()) represents a needless
round-trip conversion (compressed -> uncompressed -> compressed) of
the MFN representation.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:19:16 +0000 (08:19 +0100)]
x86: map M2P table sparsely

Avoid backing M2P table holes with memory, when those holes are large
enough to cover an exact multiple of large pages.

For the sake of saving and migrating guests, XENMEM_machphys_mfn_list
fills the holes in the array it returns with the MFN for the previous
range returned (thanks to Keir pointing out that it really doesn't
matter *what* MFN gets returned for invalid ranges). Using the most
recently encountered MFN (rather than e.g. always the first one)
represents an attempt to cut down on the number of references these
pages will get when they get mapped into a privileged domain's address
space.
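The hole-filling rule for the returned MFN list can be sketched as follows. The sketch assumes a hypothetical per-chunk array in which HOLE_SKETCH marks a chunk with no backing memory. Each hole reuses the MFN of the most recently seen backed chunk. The message does not say what happens if the very first chunk is a hole, so the sketch assumes that leading holes take the first backed MFN.

```c
#include <assert.h>
#include <stddef.h>

#define HOLE_SKETCH (~0UL)   /* hypothetical marker: no memory backs this M2P chunk */

/* Fill holes with the MFN of the previous backed chunk.
 * Returns 0 on success, -1 if no chunk is backed at all. */
static int fill_m2p_list_sketch(unsigned long *mfns, size_t n)
{
    size_t i = 0;

    assert(mfns != NULL || n == 0);
    while (i < n && mfns[i] == HOLE_SKETCH)   /* find the first backed chunk */
        i++;
    if (i == n)
        return -1;

    unsigned long last = mfns[i];             /* leading holes reuse this one */
    for (size_t j = 0; j < n; j++) {
        if (mfns[j] == HOLE_SKETCH)
            mfns[j] = last;   /* any MFN would do; the most recent one is reused */
        else
            last = mfns[j];
    }
    return 0;
}
```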

This also allows for saving a couple of 2M pages even on certain
"normal" systems.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:18:19 +0000 (08:18 +0100)]
x86: map frame table sparsely

Avoid backing frame table holes with memory, when those holes are
large enough to cover an exact multiple of large pages. This is based
on the introduction of a bit map, where each bit represents one such
range, thus allowing mfn_valid() checks to easily filter out those
MFNs that now shouldn't be used to index the frame table.

This allows for saving a couple of 2M pages even on "normal" systems.
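The bitmap-based check described above can be sketched in C. The chunk size (2^16 MFNs per bit) and the single-word bitmap are hypothetical simplifications. In the hypervisor the granularity follows from how many frame-table entries fit in a large page. The sketch only shows the shape of the test that mfn_valid() gains: an MFN must lie below the maximum and inside a backed chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_SHIFT 16   /* hypothetical: each bit covers 2^16 MFNs */
#define MAX_CHUNKS  64   /* hypothetical: the bitmap fits in one word */

struct frametable_sketch {
    uint64_t chunk_present;   /* bit i set => chunk i has frame-table backing */
    unsigned long max_mfn;    /* one past the highest MFN */
};

/* Mark the MFN range [start, end) as backed by memory. */
static void mark_range_present(struct frametable_sketch *ft,
                               unsigned long start, unsigned long end)
{
    assert(end > start);
    for (unsigned long c = start >> CHUNK_SHIFT; c <= (end - 1) >> CHUNK_SHIFT; c++) {
        assert(c < MAX_CHUNKS);
        ft->chunk_present |= (uint64_t)1 << c;
    }
}

/* Valid only if below max_mfn and inside a chunk whose frame table is mapped. */
static bool mfn_valid_sketch(const struct frametable_sketch *ft, unsigned long mfn)
{
    if (mfn >= ft->max_mfn)
        return false;
    unsigned long c = mfn >> CHUNK_SHIFT;
    return c < MAX_CHUNKS && ((ft->chunk_present >> c) & 1);
}
```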

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:16:49 +0000 (08:16 +0100)]
x86-64: reduce range spanned by 1:1 mapping and frame table indexes

Introduce a virtual-space-conserving transformation of the MFN that has
so far been used to index the 1:1 mapping and the frame table: the
largest range of contiguous bits (below the most significant one) that
are zero for all valid MFNs is removed from the MFN representation used
to index into those arrays, thereby roughly halving the virtual range
these tables must cover with each bit removed.
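The transformation can be sketched in C, assuming a hypothetical input mask that is the OR of every bit set in some valid MFN. The sketch finds the longest run of zero bits below the most significant set bit, then squeezes that run out of an MFN and restores it again. All names are illustrative and are not claimed to match the hypervisor's own helpers.

```c
#include <assert.h>
#include <stdint.h>

/* The removed hole is bits [shift, shift + width). */
struct pdx_hole { unsigned int shift, width; };

/* Longest run of zero bits strictly below the most significant set bit of mask. */
static struct pdx_hole find_pdx_hole(uint64_t mask)
{
    struct pdx_hole best = { 0, 0 };
    unsigned int msb = 63, run_start = 0, run = 0;

    assert(mask != 0);
    while (!((mask >> msb) & 1))          /* locate the most significant set bit */
        msb--;
    for (unsigned int b = 0; b < msb; b++) {
        if ((mask >> b) & 1) {
            run = 0;                      /* bit used by some valid MFN: run ends */
        } else {
            if (run == 0)
                run_start = b;
            if (++run > best.width) {
                best.shift = run_start;
                best.width = run;
            }
        }
    }
    return best;
}

/* Compress: keep the bits below the hole, shift the bits above it down. */
static uint64_t mfn_to_pdx_sketch(uint64_t mfn, struct pdx_hole h)
{
    uint64_t low = mfn & ((UINT64_C(1) << h.shift) - 1);
    return low | ((mfn >> (h.shift + h.width)) << h.shift);
}

/* Decompress: the inverse, re-inserting the always-zero bits. */
static uint64_t pdx_to_mfn_sketch(uint64_t pdx, struct pdx_hole h)
{
    uint64_t low = pdx & ((UINT64_C(1) << h.shift) - 1);
    return low | ((pdx >> h.shift) << (h.shift + h.width));
}
```

Take as an example valid MFNs 0x0-0xFFFF and 0x10000000-0x1000FFFF. The mask is 0x1000FFFF, bits 16-27 are always zero, and removing those 12 bits maps MFN 0x10001234 to index 0x11234.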

Since this should account for hotpluggable memory (so as not to
require a rewrite when that gets supported), the determination of
which bits are candidates for removal must not be based on the E820
information, but instead has to use the SRAT. That in turn requires a
change to the ordering of steps done during early boot.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:14:48 +0000 (08:14 +0100)]
x86-64: extend manageable memory range to 5Tb

Extend the virtual range reserved for the 1:1 mapping to cover 5Tb,
and make the virtual size of the frame table match whatever the
1:1 table can cover.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 22 Sep 2009 07:06:14 +0000 (08:06 +0100)]
vpmu_core2: support newer processors

Add code to get fully virtualized performance counters on newer
processors (those I am able to test!). Most of the work is checking
for reserved bits in the control and counter registers.

Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Keir Fraser [Tue, 22 Sep 2009 07:04:58 +0000 (08:04 +0100)]
x86 hvm: small cleanup in vpmu

Replace the vpmu-specific define LVTPC_HVM_PMU with the globally
used define PMU_APIC_VECTOR, to avoid different names for the
same thing.

Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Keir Fraser [Tue, 22 Sep 2009 07:02:50 +0000 (08:02 +0100)]
pv_ops: Build xen/master branch rather than xen-tip/master

Keir Fraser [Tue, 22 Sep 2009 07:02:01 +0000 (08:02 +0100)]
Update QEMU_TAG to f09a5ba89434bb3f28172640354258d1d6cd8579

Keir Fraser [Tue, 22 Sep 2009 07:01:06 +0000 (08:01 +0100)]
x86: Fix memory leak in mce_wrmsr

Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
Keir Fraser [Tue, 22 Sep 2009 07:00:36 +0000 (08:00 +0100)]
x86 hvm: fix missing ticks bug of c/s 20218

With c/s 20218, timer ticks might be missed when IRQs of a timer are
queued. "Next scheduled time" is accumulated wrongly.

Thanks to Christoph for the report.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Reported-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Fri, 18 Sep 2009 13:45:40 +0000 (14:45 +0100)]
Revert 20221:fc94d586d02f

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 18 Sep 2009 07:46:32 +0000 (08:46 +0100)]
iommu: Fix pirq conflict issue when guest adopts per-cpu vector.

The latest Linux and Windows may use per-cpu vectors instead of a
global vector, so the same vector on different vcpus may correspond to
different interrupt sources. That is to say, vector and pirq form a
1:n mapping, which the array msi_gvec_pirq cannot represent. The
related logic therefore needs to be improved; otherwise it may
introduce strange issues.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Fri, 18 Sep 2009 07:44:38 +0000 (08:44 +0100)]
xend: Implement VIF.get_network

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Fri, 18 Sep 2009 07:29:46 +0000 (08:29 +0100)]
AMD IOMMU: Extend the loop counter for polling completion wait bit.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Fri, 18 Sep 2009 07:29:19 +0000 (08:29 +0100)]
AMD IOMMU: Remove unused definitions.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Fri, 18 Sep 2009 07:28:52 +0000 (08:28 +0100)]
AMD IOMMU: If interrupt remapping is disabled, then do not update
interrupt remapping table with IOAPIC write.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Fri, 18 Sep 2009 07:28:20 +0000 (08:28 +0100)]
AMD IOMMU: Allow enabling iommu debug output at run time.

The old compile-time option is removed.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Fri, 18 Sep 2009 07:27:38 +0000 (08:27 +0100)]
xend: Unlink VDI instances and VBD instances

When VBD instances are destroyed by the xm delete command, VDI
instances keep linking to the VBD instances unilaterally.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Fri, 18 Sep 2009 07:26:53 +0000 (08:26 +0100)]
Revert 20194:582970a2d2dc

Excessively slows down domain creation in debug builds.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 16 Sep 2009 08:30:41 +0000 (09:30 +0100)]
x86 hvm: suspend platform timer emulation while its IRQ is masked

This patch removes a timer whose IRQ is masked from the vcpu's timer
list. This reduces the overhead of VM exits and VM context switches.

Also fixes a potential bug.
(1) VCPU#0: masks the IRQ of a timer. (e.g. vioapic.redir[2].mask=1)
(2) VCPU#1: pt_timer_fn() is invoked on expiry of the timer.
(3) VCPU#1: pt_update_irq() is called but does nothing because
pt_irq_masked()==1.
(4) VCPU#1: sleeps via halt.
(5) VCPU#0: unmasks the IRQ of the timer.
After that, no one wakes up VCPU#1.

IRQ of ISA is masked by:
 - PIC's IMR
 - IOAPIC's redir[0]
 - IOAPIC's redir[N].mask
 - LAPIC's LVT0
 - LAPIC enabled/disabled

IRQ of LAPIC timer is masked by:
 - LAPIC's LVTT
 - LAPIC disabled

When any of the above changes, the corresponding vcpu is kicked and
suspended timer emulation is resumed.

In addition, a small bug fix in pt_adjust_global_vcpu_target().

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Keir Fraser [Wed, 16 Sep 2009 08:29:17 +0000 (09:29 +0100)]
x86 hvm: don't set periodical timer again until its IRQ is delivered.

Modern Windows releases (e.g. XP, 2003, 2008) never use the PIT timer,
nor cpu#0's LAPIC timer after boot. Despite that, Xen keeps emulating
them busily, which is inefficient.

With this patch, setting a timer is deferred while its IRQ is masked.

pt_timer_fn() simply calls vcpu_kick() for two reasons:
- A pt_irq_masked() check there would be redundant, since
  pt_update_irq() performs the same check.
- pt_timer_fn() is likely to run on the same processor as
  pt->vcpu->processor, so vcpu_kick() rarely needs to send an IPI.
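The "re-arm only after delivery" rule can be sketched as below. This is a hypothetical Python model, not Xen's implementation: DeferredPeriodicTimer, expire() (playing the role of pt_timer_fn()) and irq_delivered() (playing the role of the IRQ injection path) are illustrative names. The point it shows is that a timer whose IRQ stays masked fires once and then stays idle instead of being emulated every period.

```python
class DeferredPeriodicTimer:
    """Hypothetical model: a periodic timer that is re-armed only after
    its IRQ has actually been delivered to the guest."""

    def __init__(self, period, first_expiry):
        self.period = period
        self.armed_until = first_expiry  # next expiry time, or None if idle
        self.irq_pending = False
        self.fires = 0

    def expire(self, now):
        # Analogue of pt_timer_fn(): record the tick and (in Xen) kick the
        # vcpu. The timer is deliberately NOT re-armed here.
        if self.armed_until is not None and now >= self.armed_until:
            self.fires += 1
            self.irq_pending = True
            self.armed_until = None
            return True
        return False

    def irq_delivered(self, now):
        # Analogue of the injection path: only now is the next period set.
        if self.irq_pending:
            self.irq_pending = False
            self.armed_until = now + self.period
```

While the guest keeps the IRQ masked, irq_delivered() is never called, so no further expiries are scheduled.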

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Remove "buffer half full" check from guest_console_write
Keir Fraser [Wed, 16 Sep 2009 08:26:04 +0000 (09:26 +0100)]
Remove "buffer half full" check from guest_console_write

Checks are already made at a lower level in the serial code, and the
policy there is to drop output rather than wait. This way, boot makes
progress even when the serial hardware is problematic.
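A "drop rather than wait" policy means the writer never blocks on a full transmit buffer. The following is a minimal Python sketch of that policy, assuming a fixed-capacity buffer; SerialTxBuffer and its methods are hypothetical and do not correspond to Xen's serial driver code.

```python
from collections import deque


class SerialTxBuffer:
    """Hypothetical bounded transmit buffer that drops instead of waiting."""

    def __init__(self, capacity):
        self.buf = deque()
        self.capacity = capacity
        self.dropped = 0

    def putc(self, ch):
        if len(self.buf) >= self.capacity:
            # Hardware is not draining the buffer: drop, never block.
            self.dropped += 1
            return False
        self.buf.append(ch)
        return True

    def write(self, s):
        for ch in s:
            self.putc(ch)
```

With a stuck UART the buffer simply fills up and later characters are counted as dropped, so callers such as the boot path keep running.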

Signed-off-by: Chris Lalancette <clalance@redhat.com>
xend: Consider ioemu devices for inactive managed domains
Keir Fraser [Wed, 16 Sep 2009 08:22:38 +0000 (09:22 +0100)]
xend: Consider ioemu devices for inactive managed domains

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
AMD IOMMU: Rework of interrupt remapping
Keir Fraser [Wed, 16 Sep 2009 08:21:56 +0000 (09:21 +0100)]
AMD IOMMU: Rework of interrupt remapping

1) Parse the IVRS special device entries in order to handle IOAPIC
remapping correctly.
2) Allocate per-device interrupt remapping tables instead of using a
single global interrupt remapping table.
3) Some system devices, such as the north-bridge IOAPIC, cannot be
discovered during PCI device enumeration. To remap interrupts from
those devices, the device table update is split into two steps, so
that interrupt remapping tables can be bound to a device table entry
earlier than the I/O page tables.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
x86: irq ratelimit
Keir Fraser [Wed, 16 Sep 2009 08:16:38 +0000 (09:16 +0100)]
x86: irq ratelimit

This patch adds IRQ rate limiting. If too many IRQs are observed in a
short period (an IRQ storm), the (guest) interrupt is temporarily
masked, to keep Xen and other guests responsive.

For now, the threshold can be adjusted at boot time using the
command-line option irq_ratelimit=xxx.
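The storm detection described above can be sketched as a per-IRQ counter over a time window. This Python model is an assumption-laden illustration, not the patch's code: the fixed counting window, the IrqRateLimiter class and the explicit unmask() call are all hypothetical; only the idea of a threshold (set via irq_ratelimit=xxx) comes from the commit message.

```python
class IrqRateLimiter:
    """Hypothetical per-IRQ storm detector: mask the IRQ once more than
    `threshold` interrupts arrive within one counting window."""

    def __init__(self, threshold, window):
        self.threshold = threshold  # analogous to irq_ratelimit=xxx
        self.window = window        # length of the counting period
        self.window_start = 0
        self.count = 0
        self.masked = False

    def on_irq(self, now):
        """Return True if the IRQ may be delivered, False if masked."""
        if now - self.window_start >= self.window:
            self.window_start = now  # start a fresh counting window
            self.count = 0
        self.count += 1
        if self.count > self.threshold:
            self.masked = True       # storm detected: mask temporarily
        return not self.masked

    def unmask(self, now):
        # In a real system this would run later, e.g. from a timer.
        self.masked = False
        self.window_start = now
        self.count = 0
```

A fixed window is the simplest choice; a sliding window or token bucket would smooth bursts differently, and the commit message does not say which the patch uses.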

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
x86 hvm: Guests should scan CPUID range 40000000-4000ff00 for Xen leaves.
Keir Fraser [Wed, 16 Sep 2009 07:55:23 +0000 (08:55 +0100)]
x86 hvm: Guests should scan CPUID range 40000000-4000ff00 for Xen leaves.
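A guest-side scan of that range can be sketched as follows, stepping 0x100 at a time and matching the 12-byte "XenVMMXenVMM" signature returned in EBX, ECX and EDX. The cpuid argument is a hypothetical callable standing in for the CPUID instruction, and the "EAX must cover at least base+2" guard is an assumption modelled on common guest code rather than something stated here.

```python
XEN_SIGNATURE = b"XenVMMXenVMM"


def find_xen_cpuid_base(cpuid):
    """Scan leaves 0x40000000..0x4000ff00 in 0x100 steps for Xen.

    `cpuid` is a caller-supplied function leaf -> (eax, ebx, ecx, edx)
    that stands in for the real CPUID instruction.
    """
    for base in range(0x40000000, 0x40010000, 0x100):
        eax, ebx, ecx, edx = cpuid(base)
        sig = b"".join(r.to_bytes(4, "little") for r in (ebx, ecx, edx))
        # EAX reports the highest leaf in this block; require base+2.
        if sig == XEN_SIGNATURE and eax - base >= 2:
            return base
    return None
```

Scanning rather than hard-coding 0x40000000 lets a guest find the Xen leaves even when another interface occupies the first block.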

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
xenoprof: force use of architectural perfmon instead of the CPU
Keir Fraser [Tue, 15 Sep 2009 09:08:12 +0000 (10:08 +0100)]
xenoprof: force use of architectural perfmon instead of the
CPU-specific event set, which may not yet be supported by the oprofile
user-space tool.

Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
xenoprof: support Intel's architectural perfmon registers.
Keir Fraser [Tue, 15 Sep 2009 09:03:16 +0000 (10:03 +0100)]
xenoprof: support Intel's architectural perfmon registers.

One benefit is that more perfmon counters can be used on Nehalem.
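The number of architectural counters is discoverable from CPUID leaf 0xA, whose EAX packs the version ID (bits 7:0), the number of general-purpose counters per logical processor (bits 15:8), the counter bit width (bits 23:16) and the length of the EBX event vector (bits 31:24), per Intel's documentation. The decoder below is an illustrative Python sketch of that layout, not xenoprof's code; the EAX value in the test is an example, not a measured Nehalem result.

```python
def decode_arch_perfmon(eax):
    """Decode EAX of CPUID leaf 0xA (architectural performance monitoring)."""
    return {
        "version": eax & 0xFF,
        "num_gp_counters": (eax >> 8) & 0xFF,
        "counter_width": (eax >> 16) & 0xFF,
        "ebx_vector_length": (eax >> 24) & 0xFF,
    }
```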

Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
xenoprof: add support for Core i7 and Atom.
Keir Fraser [Tue, 15 Sep 2009 09:02:15 +0000 (10:02 +0100)]
xenoprof: add support for Core i7 and Atom.

Signed-off-by: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
x86: Free unused pages of per-cpu data.
Keir Fraser [Tue, 15 Sep 2009 08:54:16 +0000 (09:54 +0100)]
x86: Free unused pages of per-cpu data.

As well as freeing the data pages of impossible cpus, we also free the
pages of all other cpus that contain no actual data (because the
statically defined PERCPU_SHIFT is too large).
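The page accounting behind this change is simple arithmetic: each cpu's area is 2^PERCPU_SHIFT bytes, only the pages actually covered by per-cpu data need to stay, and an impossible cpu can give back its whole area. The sketch below assumes 4 KiB pages (PAGE_SHIFT = 12); freeable_percpu_pages() is a hypothetical helper and the sizes in the test are illustrative, not Xen's real values.

```python
PAGE_SHIFT = 12  # 4 KiB pages, as on x86


def freeable_percpu_pages(percpu_shift, data_bytes, possible):
    """Return (pages_kept, pages_freed) for one cpu's per-cpu area."""
    area_pages = 1 << (percpu_shift - PAGE_SHIFT)
    if not possible:
        return 0, area_pages  # impossible cpu: free the whole area
    used = -(-data_bytes // (1 << PAGE_SHIFT))  # ceiling division
    used = min(used, area_pages)
    return used, area_pages - used
```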

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
x86: Re-increase size of percpu area
Keir Fraser [Tue, 15 Sep 2009 08:52:26 +0000 (09:52 +0100)]
x86: Re-increase size of percpu area

The per-cpu vector code adds a lot of percpu data. With perfc also
enabled, one page per cpu is no longer enough.

Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
p2m: Fix debug build.
Keir Fraser [Tue, 15 Sep 2009 08:46:08 +0000 (09:46 +0100)]
p2m: Fix debug build.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Update QEMU_TAG to 3de6cb51b19c46967cbc88ceb202b240c736eeca
Keir Fraser [Tue, 15 Sep 2009 08:26:52 +0000 (09:26 +0100)]
Update QEMU_TAG to 3de6cb51b19c46967cbc88ceb202b240c736eeca

xend: Fix VDI.get_record
Keir Fraser [Tue, 15 Sep 2009 08:26:08 +0000 (09:26 +0100)]
xend: Fix VDI.get_record

VDI.get_record did not return the correct VDI records. With this
patch, it does.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
x86 mce: Fix panic in mcheck_mca_logout
Keir Fraser [Tue, 15 Sep 2009 08:25:41 +0000 (09:25 +0100)]
x86 mce: Fix panic in mcheck_mca_logout

I hit a panic in mcheck_mca_logout(). MSR_IA32_MCi_ADDR may hold
values other than a machine address, and a FATAL PAGE FAULT occurred
when such a non-existent address was passed to maddr_get_owner().
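One way to avoid the fault is to validate the reported address before doing any owner lookup. The commit message does not describe the actual fix, so the following is only a hedged Python sketch: the RAM-range check, owner_of_mca_addr() and the get_owner callable (standing in for maddr_get_owner()) are all hypothetical.

```python
def owner_of_mca_addr(addr, ram_ranges, get_owner):
    """Look up the owner of an MCi_ADDR value only if it is real RAM.

    ram_ranges: list of (start, end) machine-address ranges, end exclusive.
    get_owner:  stand-in for maddr_get_owner(); only called on addresses
                that fall inside a known RAM range.
    """
    if not any(start <= addr < end for start, end in ram_ranges):
        return None  # not a machine RAM address: skip the lookup
    return get_owner(addr)
```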

Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
Vt-d: queued invalidation cleanup
Keir Fraser [Tue, 15 Sep 2009 08:24:59 +0000 (09:24 +0100)]
Vt-d: queued invalidation cleanup

This patch cleans up queued invalidation, including the queue
wrap-around check, handling of multiple polling statuses and other
minor changes. This version uses a local variable as the polling
address, which is cleaner.
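The wrap-around check and the per-caller polling variable can be illustrated with a small circular-queue model. This Python sketch is hypothetical and does not reflect VT-d's register or descriptor layout: InvalidationQueue, hw_process_one() (standing in for the hardware) and flush_and_wait() are illustrative names. The status dictionary plays the role of the local variable that the wait descriptor writes to.

```python
class InvalidationQueue:
    """Hypothetical circular invalidation queue (not VT-d's real layout)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.head = 0  # next entry the hardware will fetch
        self.tail = 0  # next free entry software will fill

    def is_full(self):
        # One slot stays empty so that head == tail always means "empty".
        return (self.tail + 1) % self.size == self.head

    def submit(self, desc):
        if self.is_full():
            raise BufferError("invalidation queue full")
        self.slots[self.tail] = desc
        self.tail = (self.tail + 1) % self.size  # wrap around at the end

    def hw_process_one(self):
        # Stand-in for hardware consuming one descriptor.
        desc = self.slots[self.head]
        self.head = (self.head + 1) % self.size
        if desc[0] == "wait":
            desc[1]["status"] = "done"  # written to the caller's variable


def flush_and_wait(q):
    status = {"status": "pending"}  # local polling variable for this caller
    q.submit(("iotlb_inv", None))
    q.submit(("wait", status))
    while status["status"] != "done":
        q.hw_process_one()
    return status["status"]
```

Giving each caller its own status variable means concurrent waiters cannot confuse each other's completion, which a single shared polling address would risk.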

Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
x86: Remove PSE flag from PV guest CR4 and CPUID.
Keir Fraser [Tue, 15 Sep 2009 08:23:44 +0000 (09:23 +0100)]
x86: Remove PSE flag from PV guest CR4 and CPUID.

From: Dave McCracken <dcm@mccr.org>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
pygrub: Correct pygrub return value
Keir Fraser [Tue, 15 Sep 2009 08:21:34 +0000 (09:21 +0100)]
pygrub: Correct pygrub return value

This patch corrects the return value of pygrub's checkPassword()
function. It did not return False at the end of the function; it
returned None instead. Since None is also false in Python, it worked
fine, so this is most likely just a cosmetic issue.

Also, the missing () were added to the hasPassword call in
checkPassword(), and an unnecessary comment was removed.
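Both fixes follow a common Python pattern, sketched below with a hypothetical PasswordConfig class. It is not pygrub's real code; in particular the plain string comparison and the "no password means allowed" rule are assumptions made for illustration. The first point is the explicit final return False. The second is that a bare self.hasPassword (without parentheses) is a bound method object, which is always truthy.

```python
class PasswordConfig:
    """Hypothetical stand-in for pygrub's config object."""

    def __init__(self, password=None):
        self.password = password

    def hasPassword(self):
        return self.password is not None

    def checkPassword(self, candidate):
        # hasPassword must be *called*: `self.hasPassword` without ()
        # is a bound method object and therefore always truthy.
        if not self.hasPassword():
            return True
        if candidate == self.password:
            return True
        return False  # explicit, rather than falling off the end (None)
```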

Signed-off-by: Michal Novotny <minovotn@redhat.com>
xend: Receive error message of migration from destination server
Keir Fraser [Tue, 15 Sep 2009 08:20:47 +0000 (09:20 +0100)]
xend: Receive error message of migration from destination server

The following error message was shown by the xm migrate command. I
caused the command error intentionally: I prepared a destination
server with insufficient free memory and then tried to migrate a VM
to it. As expected, the command failed. However, the error message
was not what I expected. If an error occurs on the destination
server, I would like the error message from the destination server
to be shown.

# xm migrate --live vm3 bx339
Error: (107, 'Transport endpoint is not connected')
Usage: xm migrate <Domain> <Host>

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-n=nodenum, --node=nodenum
                     Use specified NUMA node on target.
-s, --ssl            Use ssl connection for migration.

If the destination server sends an error message, this patch shows
that message. For example, the following error message is shown if
the destination server does not have enough free memory.

# xm migrate --live vm3 bx339
Error: I need 262144 KiB, but dom0_min_mem is 716800 and shrinking
to 716800 KiB would leave only 50368 KiB free. (from bx339)
Usage: xm migrate <Domain> <Host>

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-n=nodenum, --node=nodenum
                     Use specified NUMA node on target.
-s, --ssl            Use ssl connection for migration.
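The idea of relaying the destination's error to the source can be sketched with a pair of connected sockets. The wire format here (an "ERROR " prefix and newline termination) and both helper functions are invented for illustration; xend's actual migration protocol is not described in this message. Only the "(from host)" suffix mirrors the output shown above.

```python
import socket


def send_error(sock, message):
    # Destination side: report the failure before closing the connection.
    sock.sendall(("ERROR " + message + "\n").encode())


def read_migration_result(sock, dest_host):
    # Source side: read the destination's reply instead of failing with a
    # bare socket error such as (107, 'Transport endpoint is not connected').
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            break  # peer closed the connection
        data += chunk
    line = data.decode().strip()
    if line.startswith("ERROR "):
        raise RuntimeError("%s (from %s)" % (line[len("ERROR "):], dest_host))
    return line
```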

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>